ABSTRACT

To conceptualize a reader's comprehension of text as a semantic and
interpretive processing of information, it is necessary to take note of
interactions among persons and texts and conditions under which the texts
are to be comprehended. A Computer-Assisted Language Analysis System
(CALAS) was constructed which focuses on the text as any interpretable
record of the employment of a language. In making its interpretations,
CALAS utilizes a model of the English language, which imputes to its texts
the properties of a syntactic and semantic grammar. Data analysis is
accomplished in three stages: (1) analyzing the text to identify each
word in sequence in terms of its grammatical equivalent; (2) gathering
the individual words into phrases, which are again identified in terms of
their grammatical equivalents; and (3) gathering phrases into clauses,
with the component phrases displayed within each clause and identifying
the phrases within a clause in terms of the roles each plays within the
clause. This macro/micro analysis of text invites identification of and
comparison among texts in terms of their structural properties. (HOD)
Computer simulation of cognitive processes in humans offers a startling
vision of "artificial intelligence" in machines. The work has captured
public attention with claims that are often exciting, unsettling, and
overstated (Restak, 1980). A contemporary vogue in research on reading
draws upon this effort in attempts to portray the complex processes by
which people select, interpret, and subsequently retrieve the information
conveyed to them by texts. The underlying idea is that readers comprehend
and remember texts by recourse to hierarchically-ordered schemata,
developed out of prior experience (Adams & Collins, 1979). Much of what is
reported understates the case for individual differences among persons in
their development and use of any such schemata.

The research of colleagues like Bonnie Meyer (1980) and Bruce Dunn (1980)
confronts us with the necessity of accounting for the fact that people do
differ in their comprehension of texts. Meyer's (Note 1) and Dunn's (Note 2)
research points up the added relevance of accounting for differences among
texts to be comprehended and among conditions under which any such
comprehension of text is invited to occur (cf. DeStefano, 1978, and
Scribner, 1979, on sociocultural conditions to be accounted for). However
tempting it may be for us to conceptualize a reader's comprehension of text
as a semantic and interpretive processing of information (Adams & Collins,
1979), it is equally necessary to take note of interactions among persons
and texts and conditions under which the texts are to be comprehended. At
the very least, the investigation of "reading comprehension" calls for a
strategy of inquiry that is multi-faceted.



A Differentiation of Texts

"Multi-faceted" in this sense implies that what goes on in people's
heads as they read or what is in the texts themselves to be comprehended
are necessary but not sufficient conditions of reading comprehension and
that we are warranted in taking account of individual differences in each
as well as in their interactions. Having belabored that point, which I
think we need to be reminded of, I want to focus the remainder of my remarks
on the important problem of differentiating among texts to be comprehended
as "multi-faceted" in its own right. Again, Bonnie Meyer (Note 1) gives us a
clue as to dimensions of this problem in describing her construction of a
system in which texts can be analyzed in terms of their structural properties
at a top, middle, and bottom level of analysis. Fortunately, she is with us
to speak more eloquently for herself on this symposium.

What strikes me is that there are critical respects in which our views
of the problem are consistent with each other. A first is the assumption of
levels at which texts can be analyzed, ranging from a more global,
macroanalytic perspective to one that is more narrowly focused, hence more
microanalytic. A second is the presumption that structure can be imputed to
texts at any of the levels into which they may be composed. At the risk of
putting words into Dr. Meyer's mouth, which she may not wish to utter, let
me suggest that for both of us the concept of "structure" postulates the
existence of named classes of phenomena and their relations to each other.
In the domain of analyzing texts where I am most at home, namely that of
clauses and larger blocks of main and subordinate clauses, essential
structural ingredients are postulated to exist in the form of noun phrases
as named things and of verb phrases as things that define relations between
noun phrases.

In the remainder of this paper, I shall describe briefly the system
for analyzing texts from a microanalytic perspective, at once the primary
source and object of my remarks about text on this occasion. Elsewhere, I
have provided the foundation for a theory of meaning in the context of which
the system has been constructed (Pepinsky, Note 3), so I'll but touch upon
it here. The idea is that when people employ language in communication with
each other, they are both (a) governed by implicit linguistic rules, which
make possible a common sensing and understanding of things, and (b) prone
to innovate and/or modify their linguistic rules, which enable them to
enhance their sense of common understanding. Hence, in their communication,
people are inferred to be both structured and structuring. Their language
itself is thus identified as a system of formulations which enable people to
make evident to each other their access to information in a mode of
communication (Pepinsky & Patton, 1971). Based on these principles, a
Computer-Assisted Language Analysis System (CALAS) has been constructed.
CALAS centers our attention on text as any interpretable record of the
employment of a language. The texts of events like counseling interviews or
talk in a classroom afford familiar and pertinent examples of such records.
Given present technology, printed transcripts, as the text of spoken
English, may be fed as inputs directly into a computer, which, when properly
instructed, promptly reads and interprets their contents.

In making its interpretations, CALAS utilizes a model of the English
language, which imputes to its texts the properties of a syntactic and a
semantic grammar. The syntactic grammar presupposes the text to occur as
a string of words, such that each word or cluster of words may be assigned
a grammatical label in terms of the slot it occupies and the purpose it
serves in an ordered sequence. In contrast, a semantic grammar attributes
relationships to component parts of the sequence, apart from the order in
which they appear. These linear and non-linear structures imposed upon
texts by the model comprise a metalanguage, which incorporates the rationale
and methodological features of a computational psycholinguistics (Pepinsky,
Note 3). Today, I may but outline briefly how this is done (for a fuller
description, along with a partial list of earlier research studies in which
it has been employed, see the CALAS Manual: Pepinsky, Baker, Matalon, May,
& Staubus, Note 4).

For its computer operations, CALAS relies upon a series of four
programs of language analysis which, along with their implementing rules,
make use of two programming languages: SPITBOL and PL/I. These programs,
designed to be run on an IBM System 370/Model 168 computer, have been
adapted for use on certain other computers such as our present Amdahl
system. But that is only part of the story. By design, CALAS also includes
human editors who, according to instructions, assist the computer and its
human programmer in the processing of data. As its raw material, CALAS
ingests "machine-readable" text, which has been key-punched onto cards or
tapes from original text or transcripts of speech.

With these resources, data analysis is accomplished in three
stages. It is in Stage 1, called EYEBALL, that CALAS makes the syntactic
analysis of text, identifying each word in sequence in terms of its
grammatical equivalent, e.g., noun, verb, adjective, adverb, preposition.
CALAS does this by reference to a small dictionary (of approximately 600,
mainly "function" (Fries, 1952), words) and, as importantly, a set of rules
for identifying other words in terms of where in sequence each appears and
what role it is thus intended to play. Where alternative roles are plausible
for a word, the computer is programmed to list these in the order of their
most likely occurrence for that word in that slot, e.g., as adjective, verb,
noun. At this point, one or more humans (we recommend at least two persons)
edit the text according to instructions, rapidly correcting the relatively
few evident "errors" made by the computer. That editing is an important
addition to the process of analysis because, as will become apparent,
earlier errors become compounded in later stages of analysis.
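
The logic of Stage 1 can be illustrated with a minimal sketch in Python. It is not the SPITBOL or PL/I code of CALAS. The six-word dictionary, the two positional rules, and the names FUNCTION_WORDS, candidate_tags, and tag_text are all assumptions made up for illustration. The sketch shows only the general idea: look a word up in a small function-word dictionary, and otherwise rank its plausible roles by the slot it occupies.

```python
# Hypothetical miniature of a Stage 1 (EYEBALL-like) pass; the dictionary
# entries and positional rules below are illustrative assumptions only.
FUNCTION_WORDS = {
    "the": "determiner",
    "a": "determiner",
    "of": "preposition",
    "in": "preposition",
    "is": "verb",
    "and": "conjunction",
}


def candidate_tags(words, i):
    """Return the plausible tags for words[i], most likely first."""
    word = words[i].lower()
    if word in FUNCTION_WORDS:
        return [FUNCTION_WORDS[word]]
    # Positional rule: look at the tag of the preceding function word, if any.
    previous = FUNCTION_WORDS.get(words[i - 1].lower()) if i > 0 else None
    if previous == "determiner":
        return ["noun", "adjective", "verb"]
    if previous == "verb":
        return ["adjective", "verb", "noun"]
    return ["noun", "verb", "adjective"]


def tag_text(text):
    """Tag each word in sequence; a human editor would then review the list."""
    words = text.split()
    return [(w, candidate_tags(words, i)) for i, w in enumerate(words)]


for word, tags in tag_text("The counselor is in the room"):
    print(word, tags)
```

The ranked list, rather than a single tag, is what leaves room for the human editors: they confirm the first alternative or promote another before the output moves on.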

The edited output of EYEBALL becomes input for a second stage of
analysis called PHRASER. Here, guided by a second set of programs, the
computer aggregates the individual words into phrases, which are again
identified in terms of their grammatical equivalents, e.g., as noun phrases,
verb phrases, adverbial phrases, prepositional phrases; also, as conjunctions
and subordinators (i.e., terms that introduce main and subordinate or partial
clauses). PHRASER, like EYEBALL, is then edited, and the system is ready for
its third and, at present, final stage of analytic display.
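
A rough sketch of the Stage 2 idea follows, again in Python and again under stated assumptions. The grouping rules and the names NOUN_LIKE, VERB_LIKE, phrase_label, and group_into_phrases are hypothetical and far simpler than whatever rules PHRASER actually applies. The point is only that edited word tags are collapsed into labeled phrases.

```python
# Hypothetical Stage 2 (PHRASER-like) grouping of edited word tags.
NOUN_LIKE = {"determiner", "adjective", "noun"}
VERB_LIKE = {"auxiliary", "verb"}


def phrase_label(tag):
    """Map a word tag to the kind of phrase it belongs to."""
    if tag in NOUN_LIKE:
        return "noun phrase"
    if tag in VERB_LIKE:
        return "verb phrase"
    if tag == "adverb":
        return "adverbial phrase"
    if tag == "preposition":
        return "prepositional phrase"
    return tag  # conjunctions and subordinators stand alone


def group_into_phrases(tagged_words):
    """Aggregate (word, tag) pairs into (phrase label, [words]) pairs."""
    phrases = []
    for word, tag in tagged_words:
        label = phrase_label(tag)
        if phrases:
            last_label, last_words = phrases[-1]
            # A prepositional phrase absorbs the noun-like words after it.
            if last_label == "prepositional phrase" and tag in NOUN_LIKE:
                last_words.append(word)
                continue
            # Adjacent words of the same phrase type are merged.
            if (last_label == label and label.endswith("phrase")
                    and tag != "preposition"):
                last_words.append(word)
                continue
        phrases.append((label, [word]))
    return phrases


edited = [
    ("the", "determiner"), ("counselor", "noun"),
    ("has", "auxiliary"), ("listened", "verb"),
    ("quietly", "adverb"),
    ("in", "preposition"), ("the", "determiner"), ("room", "noun"),
]
for label, words in group_into_phrases(edited):
    print(label, words)
```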

Stage 3 in the analysis of data processed by CALAS is called CLAUSE/CASE.
At this stage, the computer is instructed to do three things, and by
reference to a third set of programs. First, phrases are aggregated into
clauses, with the component phrases displayed within each clause. Second,
the phrases within a clause are identified in terms of the roles each plays
within the clause. Because by definition a clause contains one and only one
predicate (notably, a verb phrase), the verb phrase itself becomes an
essential feature of the clause, with other phrases as optional ones. Verb
phrases, then, are identified as particular types, basically as verbs of
state, action, process, or action-process; and secondarily as compounds of
experiential or benefactive states or actions. Noun phrases that accompany
the verb phrase are thus identified in terms of their case roles within a
clause, as objects, agents, experiencers, or beneficiaries of a state or
activity. Finally, the clauses themselves are exhibited to display a main or
independent clause, along with clauses subordinate to it and in the order of
their embedding within the block of clauses.
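
One way to picture the output of Stage 3 is as a small record per clause. The Python sketch below is an assumption-laden illustration, not the CLAUSE/CASE programs themselves. The Clause class, its field names, the display format, and the example labels are hypothetical, and the verb types assigned to "pushed" and "was stuck" are illustrative guesses rather than an authoritative case-grammar analysis.

```python
from dataclasses import dataclass, field

# The basic verb types and case roles named in the text above.
VERB_TYPES = {"state", "action", "process", "action-process"}
CASE_ROLES = {"object", "agent", "experiencer", "beneficiary"}


@dataclass
class Clause:
    """One clause: exactly one verb phrase plus optional noun phrases."""
    verb_phrase: str
    verb_type: str
    case_roles: dict = field(default_factory=dict)  # noun phrase -> role
    depth: int = 0  # 0 = main clause; larger = more deeply embedded

    def __post_init__(self):
        if self.verb_type not in VERB_TYPES:
            raise ValueError(f"unknown verb type: {self.verb_type}")
        unknown = set(self.case_roles.values()) - CASE_ROLES
        if unknown:
            raise ValueError(f"unknown case roles: {sorted(unknown)}")


def display_block(clauses):
    """Render a block of clauses, indenting each by its embedding depth."""
    lines = []
    for c in clauses:
        roles = ", ".join(f"{np} = {r}" for np, r in c.case_roles.items())
        lines.append("  " * c.depth + f"[{c.verb_type}] {c.verb_phrase} ({roles})")
    return lines


# "The client pushed the door because it was stuck." (illustrative labels)
block = [
    Clause("pushed", "action-process",
           {"the client": "agent", "the door": "object"}),
    Clause("was stuck", "state", {"it": "object"}, depth=1),
]
print("\n".join(display_block(block)))
```

Making the verb phrase a required field mirrors the definition given above, in which a clause contains one and only one predicate while noun phrases remain optional.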
In thus applying our linguistic system to the analysis of language
used in interactive talk, which we've been doing since 1974 (e.g., Bieber,
Patton, & Fuhriman, 1977; Hurndon, Pepinsky, & Meara, 1979; Meara, Shannon,
& Pepinsky, 1979; Patton, Fuhriman, & Bieber, 1977), two major kinds of
information have been quantified to date. The first includes content
measures: prominently, of the relative frequencies with which the different
types of verb phrases are used. The second includes measures of structural
or stylistic complexity: again, prominently, the ratio of the total number
of clauses to main clauses and a measure of the average embeddedness of
clauses within blocks. It seems strange to be telling you here that this
microsystem of analysis, like the macrosystems we are using (DeStefano,
Pepinsky, & Sanders, Note 5; Pepinsky, DeStefano, & Sanders, Note 6; cf.
Halliday & Hasan, 1976; Sinclair & Coulthard, 1975), is as easily applied to
texts already in written form. That is because impetus for the technical
development of CALAS came from the need for efficient, effective methods
of indexing and abstracting scientific and technical or other written
documents (Rush, Pepinsky, Meara, Landry, Strong, Valley, & Young, 1973;
Strong, Note 7). After suffering for intervening years the slings and arrows
of outrageous conversations that we have been attempting to analyze, it's a
relief even to contemplate dealing with the expository or narrative prose
of texts originally designed to be read and comprehended in that form.
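
The two kinds of measures mentioned above can be sketched in a few lines of Python. The input format (each block as a list of verb-type and depth pairs) and the function names are assumptions for illustration. The text does not give an exact formula for "average embeddedness," so taking it as the mean embedding depth over all clauses is an assumed reading and not necessarily the CALAS definition.

```python
from collections import Counter


def verb_type_frequencies(blocks):
    """Content measure: relative frequency of each verb-phrase type."""
    counts = Counter(verb_type for block in blocks for verb_type, _ in block)
    total = sum(counts.values())
    return {verb_type: n / total for verb_type, n in counts.items()}


def clause_ratio(blocks):
    """Complexity measure: total clauses divided by main clauses (depth 0)."""
    total = sum(len(block) for block in blocks)
    mains = sum(1 for block in blocks for _, depth in block if depth == 0)
    return total / mains


def average_embeddedness(blocks):
    """Complexity measure (assumed reading): mean embedding depth."""
    depths = [depth for block in blocks for _, depth in block]
    return sum(depths) / len(depths)


# Two blocks: one main clause with two nested subordinates, one lone main.
blocks = [
    [("action", 0), ("state", 1), ("state", 2)],
    [("process", 0)],
]
print(verb_type_frequencies(blocks))
print(clause_ratio(blocks))
print(average_embeddedness(blocks))
```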

A noteworthy feature of CALAS is that it invites identification of and
comparisons among texts in terms of their structural properties. To repeat
myself, I mean by "structure" the specification of things named and the
designation of how they are related to each other. From the relatively
microscopic perspective of CALAS, again, the things named are grammatical
surrogates for words called noun phrases; the relations between them are
specified by another class of grammatical surrogates called verb phrases;
more peripheral relationships may be found in terms of such things as
adverbial or prepositional phrases. These and other features of a case
grammar (ours is patterned after that evolved by Cook, 1979, who synthesized
the earlier work of Fillmore, 1968, and Chafe, 1970), have been drawn upon
for the purpose of interpreting texts and of differentiating among them as
structural phenomena. One of my former colleagues, Sue Strong (1974), nicely
extended my proposal (Pepinsky, 1974) for thus viewing and comparing texts
at empirical, analytic, and formal levels of display. She then proceeded to
outline a series of steps for translating texts thus analyzed into two- and
three-dimensional graphic forms. Accordingly, named things could be
represented as nodes and relations among them, as connecting lines of various
types and slopes. She then demonstrated how the idea of names and relations
embodied in informational blocks of clauses ("sentences") could be extended
to include those embodied in still larger segments of text.
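
The notion of names as nodes and relations as connecting lines lends itself to a simple graph representation. The Python sketch below is offered only as an illustration of that general idea. It does not reproduce Strong's (1974) steps, which are not detailed here. The triple format, the function names build_graph and shared_names, and the example sentences are all assumptions.

```python
from collections import defaultdict


def build_graph(relations):
    """Build nodes and labeled edges from (name, relation, name) triples."""
    nodes = set()
    edges = defaultdict(list)
    for source, relation, target in relations:
        nodes.update((source, target))
        edges[source].append((relation, target))
    return nodes, dict(edges)


def shared_names(relations_a, relations_b):
    """Names common to two sets of triples, i.e., where their graphs join."""
    names_a = {n for s, _, t in relations_a for n in (s, t)}
    names_b = {n for s, _, t in relations_b for n in (s, t)}
    return names_a & names_b


# Two hypothetical "sentences," each reduced to one named relation.
sentence_1 = [("the client", "pushed", "the door")]
sentence_2 = [("the door", "opened onto", "the hallway")]

nodes, edges = build_graph(sentence_1 + sentence_2)
print(sorted(nodes))
print(edges)
print(shared_names(sentence_1, sentence_2))
```

A name shared by two sentences is what lets their separate graphs be joined into one, which is one plausible reading of how names and relations might extend from single blocks of clauses to larger segments of text.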

I regret that a change of jobs, though it benefited Sue Strong, also
made it necessary for her to abandon her promising research. Stimulated by
the work of Bonnie Meyer (Note 1) and others on micro-/macroanalytic schemata
for interpreting texts, however, I have returned with enthusiasm to Strong's
(1974) long-neglected proposal for integrating research on texts at various
levels of analysis. The concepts of names and relations form a cornerstone
for inquiry along these lines.

In my opening remarks, I suggested that Bonnie Meyer's (Note 1) rationale
and methods for analyzing text similarly presume the existence of named
phenomena and relations between them, at successively more global levels of
analysis. There is a bonus to be had for viewing texts in this manner.
Namely, it becomes possible to postulate for all interpretable texts the
existence of named phenomena and relations between them that render the
texts isomorphic to each other by virtue of structural properties that they
possess in common. Moreover, it becomes possible to specify for any given
text peculiar attributes of structure and content that set it apart from
any other text.

There is a methodological problem lurking in all of this, which, in
conclusion, I'd like to call to your attention. In my experience, it has
become a truism that the most richly meaningful harvest yielded by analyses
and differentiations among texts also demands the most highly skilled,
knowledgeable, and otherwise thoroughly indoctrinated of human raters. The
same principle holds for the most globally inclusive purviews, i.e., the most
encompassing of entire texts, and of all that is implied, linguistically and
paralinguistically, when people are understood to communicate with each other
by means of natural language. Conversely, the most rigorously specified,
reliable, evidential, and replicable kinds of analyses are also the most
trivial and the dullest, and the least related to events that are
"environmentally probable" (Brunswik, 195?) in everyday life. The trick
is to learn the most about texts, and with the least amount of
self-deception, in describing and differentiating among them. I should hope
that increased attention to the structural properties of texts, via a
specification of their constituent features as named classes of phenomena
and relations between them, would make our differentiations among texts more
amenable to sensible talk about what readers are being exposed to. Above all,
I should hope that multi-faceted inquiry would be encouraged so as to keep
things both interesting and explicable.
My remarks in this paper presuppose a major requirement for students of
reading to be the persistent one of differentiating among texts to be read
and digested, treating these, if you will, as stimulus conditions whose
systematic manipulation can afford us a clearer picture of what it is that
readers are being invited to comprehend. I have proposed for this purpose
the prior task of identifying and categorizing texts in terms of their
structural properties, essential features of which are postulated to exist
as names that can be imputed to things in the text and as relations among
those named things. I have proposed further that structural elements of this
kind can be identified concurrently at microscopic and macroscopic levels of
analysis, rendering the varieties of analysis, as much as the varieties of
text to be analyzed, isomorphic to one another by virtue of their common
structural properties. Examples are the Computer-Assisted Language Analysis
System (CALAS), a microanalytic system described briefly in this paper, and
the macroanalytic system which Meyer (Note 1) and Dunn (Note 2) will now
proceed to introduce and discuss on this symposium.

My concluding remarks about the gains and opportunity costs to be realized
in choosing one over another mode of analyzing text can be extended to
encompass the larger problem of determining what and how people comprehend
when they read. Given the state of the art, this is no time for restricting
our purview. What I have elected to press for here is a systematizing of
knowledge about the phenomenon of text itself as a congeries of stimulus
materials that people are exposed to when they read. What people do with
these materials and the conditions under which any such exposure takes place
are inescapably important components of whatever we may choose to identify
as reading comprehension. My argument is that the identification and
comparative analysis of texts in terms of their structural properties is as
inescapably important to us if we are to make better sense out of their
readers.